Text File | 1992-04-05 | 17KB | 413 lines
Debugging hAWK programs
-------------------
Introduction
It doesn't run
It doesn't do what I wanted
Common bugs
---------
Introduction
---------
Errors can creep in at the specification, design, or coding stage of any
program, in any language. Symptoms of an error can range from a vague
uneasiness about the results to seemingly random crashes. In C, one of the
most difficult tasks of debugging is to stabilize a bug so that it can be
repeated consistently; fortunately, in hAWK this isn't a problem, since it
doesn't allow writing to an arbitrary memory address. So for hAWK
programs, your tasks are to find where the bug is, and fix it.
This is mainly a guide to finding where the bug is in your source code, with a
brief list of common bugs. When it's not obvious where the bug is, your two
best weapons are inserting "print" statements to give you some idea of what's
going on, and selectively commenting out lines of source to isolate the problem.
But as you gain experience writing hAWK programs you'll naturally find
yourself avoiding the common bugs, and catching the others with careful
proofreading.
Develop and test in small pieces. Follow a plan. Don't get mad, get critical.
Question everything, including this cheap advice.
---------
It doesn't run
---------
If your program doesn't start running due to a syntax error, the
"$tempStdErr" file will contain a message telling you the line number
where the error was detected. Normally this line number will be exactly
where the error is, or at most a couple of lines after the real mistake.
Most syntax errors will be easy to spot and fix—missing punctuation such
as brackets or quotes for example. One oddball error that may be difficult to
diagnose is the insidious missing "#", as in
#$Calculate: a four function calculator.
#Enter expressions using numbers and + - * /.
#If the expression is not properly formed, as in
2 + 3 * / 7.5
#then you'll get an error message.
#....
—this would produce the error message
hAWK: syntax error near line 4:
2 + 3 * / 7.5
^ parse error
in $tempStdErr. However, be thankful if you GET a message complaining
about an uncommented comment. Quite often, you'll get no message at
all—hAWK will quietly execute the comment as though it were part of your
program (more on this below - see the start of "Common bugs").
Using a hAWK keyword or built-in function name as a variable name is also a
popular error: watch out especially for "in" and "length".
A carriage return or semicolon is sometimes required in a hAWK statement
to disambiguate the syntax. See the "Grouping and breaking lines" section in
the "hAWK program structure" chapter of the hAWK User's Manual for the
details (that's section I 2 in the popup marks menu for the manual).
If you really get stuck trying to diagnose a syntax error, try looking through
the sample programs supplied with hAWK for similar constructions, as well
as rereading the relevant manual sections. See also the "Common bugs" section
below.
------------------
It doesn't do what I wanted
------------------
Proofreading helps
hAWK programs are so easy to write that there is a strong temptation to
rush. The fix for most of these problems is to carefully read through your
new code once before running it. However, we're only human....
Print power
The best way to diagnose a bug in a program is to get your program to talk to
you about what it is doing. Your most powerful debugging aid is built in to
hAWK, and goes by the name of "print". The first rule of hAWK debugging is,
Print Out What's Really Going On. Many of the suggestions below deal with
printing out diagnostics.
Make a copy
If you're debugging an ambitious program, make a copy of it and debug the
copy. By spinning off one or more versions of your program, you'll be able to
back up if you angrily delete a stupid chunk of code, only to realise later that
it was inspired and perfectly correct.
Track your changes
In a typical debugging session you will be inserting new statements on a trial
or temporary basis, and also deleting old statements. It can be difficult to
back up, and easy to get lost in a knot of conditional trials, so the second rule
of hAWK debugging is Mark Your Changes. To temporarily comment out a
statement, place "##" in front of it, rather than a single "#". If a new
statement may not be permanent, place "###" after it. Later, you can
search for your changes by looking for "##" and "###". Any variation on
this such as "#@" is perfectly fine; the goal is to be able to spot all of your
changes at any time by using your "Search" command in your editor.
You can even separate out connected changes by tagging the affected lines with
"##1" for one group, "##2" for another (the tag goes at the beginning of a
line for a delete, at the end of the line for a newly-added statement). But
this is getting a bit complicated, so do it only as a last resort.
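If searching by hand gets tedious, a few lines of hAWK can list every marked line for you. This is only a sketch: it assumes you have stuck to the "##" and "###" tags described above (any line containing "##" is reported, which covers both), and the output layout is an arbitrary choice.

```awk
# List every line carrying a "##" or "###" change mark.
# Pass it your problem program as the input file.
/##/ {
    # A mark at the start of the line means a commented-out statement;
    # a mark anywhere else is taken to be a trial statement.
    kind = ($0 ~ /^[ \t]*##/) ? "commented out" : "trial statement"
    print FNR ": (" kind ") " $0
}
```

Each marked line is printed with its line number, so you can see at a glance what you have changed so far.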
Variable values
To check the value of a variable (say x1), you might as well keep it simple:
print x1 ###
will do the job. If you are checking many variables, something like
print "x1 =", x1, "at line 43" ###
may be called for.
Variable names
There is no such thing as an undeclared variable in hAWK. This convenience
can trip you up, however, if you accidentally misspell the name of a
variable. There will be no syntax error; the misspelled name will just be
treated as a different variable. If you suspect a spelling error is the culprit
but can't spot it, run $WordFrequency using your problem program as the
input. This will produce a list of all words in your program, making it easier
to pick out wrong spellings.
Note: to have $WordFrequency skip over comments, you can uncomment
##/^#/ {next} #skip lines containing hAWK comments
just after the "BEGIN" block in it.
Assertions
Assertions are easily checked by adding an "assert" function:
function assert(expr, message, line)
{
    if (!expr)
        print "Assertion flunked:", message, "at", line
}
with usage such as
assert(x1 <= 50, "x1 <= 50", 43) ###
--note that this still prints properly if you leave out the line
number as in assert(x1 <= 50, "x1 <= 50") - you just won't
get the line number where the problem occurred.
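To see the function at work, here is a small self-contained sketch. The value given to x1 and the line numbers 43 and 44 are made up for illustration.

```awk
function assert(expr, message, line)
{
    if (!expr)
        print "Assertion flunked:", message, "at", line
}

BEGIN {
    x1 = 72                             # made-up value that breaks the first assertion
    assert(x1 <= 50, "x1 <= 50", 43)    # prints: Assertion flunked: x1 <= 50 at 43
    assert(x1 > 0, "x1 > 0", 44)        # holds, so prints nothing
}
```

Only the failing assertion says anything, so you can leave assertions in place while you hunt for the bug.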
Assertions the easy way
For the truly lazy, you could put your assertions in using an
abbreviated form, such as
a(x1 <= 50)...
a(max > 0 && max < 1000) etc
and then run this little hAWK program, which is in your "hAWK
programs" folder, on your problem program to fill out the
assertions (note it's been debugged):
# $ExpandAssertions : see "Debugging hAWK programs"
# (the tricky bits - treat quotes properly, and
# avoid using sub(), since the replacement string might contain
# a "&" which stands for "everything that was matched".)
#
# Pass it your problem program as the single input file:
# overwrites the file, replacing a(assertion) with
# assert(assertion, "assertion", line number) ###
FNR == 1 {outfile = FILENAME}
{
    if (match($0, /[ \t]*a\((.+)\)/))
    {
        match($0, /\((.+)\)/) #find the argument proper
        first = substr($0, RSTART+1, RLENGTH-2) #copy it
        second = first
        gsub(/"/, "\\\"", second) #escape quotes
        match($0, /[ \t]*/) #match starting white space
        starter = substr($0, RSTART, RLENGTH) #copy it
        $0 = starter "assert(" first ", \"" second "\", " FNR ") ###"
        ##sub(/a\((.+)\)/, "assert(" first ", \"" second "\", " FNR ") ###")
        ##-deleted, doesn't work properly if "second" contains a "&"
    }
    out[++i] = $0
}
END {
    close(outfile)
    for (j = 1; j <= i; ++j)
        print out[j] > outfile
}
This will expand your abbreviations into proper assertions:
assert(x1 <= 50, "x1 <= 50", ddd) ###...
assert(max > 0 && max < 1000, "max > 0 && max < 1000", ddd) ### etc
where ddd is the line number in your program.
Add the "assert" function above to your program too!
Function flow
Tracing function flow can be done by inserting print statements
at the start and end of each function, eg:
function a_func(args...)
{
    print "Entering:""a_func"
    ...body of function
    print "Leaving:""a_func"
    return something
}
though the "Leaving" statements require more care, as your function
might have several "return" statements. Often, just the "Entering"
print statements provide enough information for debugging.
Function flow the easy way
"Entering" print statements can be inserted with the following
hAWK program (once again your original program will be
overwritten, so use a copy):
#$EnteringFunction: add debugging to a program with functions,
# inserting print statements at beginning of each function.
# Pass it your problem program as the single input file:
# overwrites the file, so use a copy.
FNR == 1 {outfile = FILENAME}
{
    if (match($0, /^[ \t]*func/)) #start of function definition
    {
        name = $2
        len = index(name, "(")
        if (len+0 > 1)
            name = substr(name,1,len-1)
        out[++i] = $0
        # Skip over opening left curly of function
        if ($0 !~ /{/)
        {
            do
            {
                getline
                out[++i] = $0
            } while ($0 !~ /{/);
        }
        out[++i] = "print \"Entering: \"\"" name "\" ###"
    }
    else
        out[++i] = $0
}
END {
    close(outfile)
    for (j = 1; j <= i; ++j)
        print out[j] > outfile
}
You'll find this program in your "hAWK programs" folder.
Sending diagnostics to stderr
If your program writes to stdout and you'd rather send your debugging
output to a different file to make things easier to read, then instead
of just a plain "print" for your debugging
print "Debugging or error message"
you can use
print("Debugging or error message") > "stderr"
This will send diagnostics to the file $tempStdErr. The parenthesized
form of the print statement should be used to make it clear to the
interpreter that ">" means "redirect", not "greater than".
For example, to print assertions to $tempStdErr you could use
function assert(expr, message, line)
{
    if (!expr)
        print ("Assertion flunked:", message, "at", line) > "stderr"
}
And to send function flow messages to stderr, replace the appropriate
line in $EnteringFunction with
out[++i] = "print( \"Entering: \"\"" name "\") > \"stderr\" ###"
The $tempStdErr file will not be opened for you automatically after a run,
so remember to open it and take a look if you send messages there.
hAWK isn't C
hAWK declares variables for you, happily concatenates just about anything
with anything, accepts functions with a variable number of arguments,
doesn't mind if you use a name as a number, a string, AND an array
all in the same program, and just loves to print the current record to
stdout unless you say otherwise. To some extent this is "too much of a
good thing", and it certainly takes getting used to. The "Common bugs"
section below is mostly a list of things that hAWK does differently from
C, and a quick browse through will help you avoid programming in the
wrong language.
----------
Common bugs
----------
Uncommented comment: if you just can't find the bug, reread your program
and look for a line that should be a comment but isn't. Forgetting to put
the "#" at the start of a comment doesn't always cause a syntax error;
often hAWK will execute the text as though it were code without error,
and if the text involves any variables you use in your program then
odd things can happen. Typical symptoms are that lines are printed to
stdout that you didn't expect, or you can't seem to set the value of a
variable.
Watch for things like
#Setting
x = -1
#will shut off all progress dialogs,
#for quiet running.
--here, "x = -1" would be interpreted as a pattern; it would always
evaluate to nonzero, so all lines of input would be printed to stdout
(the default action if no action is given).
or
x = -1; x = 0 will enable dlogs
--here, x would have the value "0", as a result of concatenating "0"
with the (presumably) unassigned variables "will", "enable", and "dlogs",
overriding the previous "x = -1" assignment. Strange, but true.
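If the second case seems hard to believe, the following sketch reproduces it inside a BEGIN block so that the result can be printed. It assumes "will", "enable" and "dlogs" are not used as variables anywhere else in the program.

```awk
BEGIN {
    x = -1
    # The next line was meant to be a comment, but the "#" is missing.
    # "will", "enable" and "dlogs" are unassigned (empty) variables, so the
    # right-hand side is the concatenation of 0 with three empty strings.
    x = 0 will enable dlogs
    print "x is [" x "]"        # prints: x is [0]
}
```

The earlier assignment of -1 is silently lost, and no error is reported.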
Spelling error: the drawback of not having to declare variables in hAWK is
that a spelling error can accidentally create a new variable. For example,
if (maxLines > 50)
    maxlines = 50
would never set "maxLines" - it would create a new variable "maxlines"
and set it instead. If you suspect a spelling error is the culprit but can't
spot it, run $WordFrequency using your problem program as the input.
This will produce a list of all words in your program, making it easier
to pick out wrong spellings.
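A quick way to convince yourself that a misspelling really is the problem is to print both spellings. In this sketch the starting value of 80 is made up.

```awk
BEGIN {
    maxLines = 80                   # made-up starting value
    if (maxLines > 50)
        maxlines = 50               # misspelled: lower-case "l" creates a new variable
    print "maxLines =", maxLines    # prints: maxLines = 80
    print "maxlines =", maxlines    # prints: maxlines = 50
}
```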
Unintentional redirection: "getline" returns 0 at end of file, -1 if there
is a problem reading the file, and 1 if all is OK. So what does
if(getline < 0)....
do? It attempts to read from the file named "0", and usually doesn't succeed.
If you want to check whether getline is returning -1, use
if ((getline) < 0)...
Even better, use
if(getline <= 0)....
or
while (getline > 0)....
instead.
Endless getline loop: "getline" returns 0 only if it has successfully reached
the end of a file. If there is a problem opening or reading a file, "getline"
returns -1. So,
while (getline < theFile)....
will loop forever if it has trouble reading a file. Use
while (getline < theFile > 0)....
instead.
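Both getline traps can be avoided by always putting the getline in its own parentheses and comparing the result with 0. The following is a sketch only: the file name is hypothetical and is assumed not to exist, so that the error case is the one exercised.

```awk
BEGIN {
    theFile = "no-such-file.txt"    # hypothetical name, assumed not to exist

    count = 0
    # The inner parentheses make "<" a redirection and "> 0" a comparison;
    # the loop stops on 0 (end of file) as well as on -1 (error).
    while ((getline line < theFile) > 0)
        count++

    # Ask once more to find out which of the two ended the loop.
    status = (getline line < theFile)
    if (status < 0)
        print "could not read", theFile
    else
        print count, "lines read from", theFile
    close(theFile)
}
```

With a file that does exist, the same program reports how many lines it read.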
Unassigned variable: sometimes you may wish to safeguard against forgetting
to assign a value to a variable, or to give it a default value if no value was
ever assigned. The tests for this are:
if:                 then this test is true:
---------------     -----------------------
x is unassigned     if (x == "" && x == 0)
x = anything        if (x != 0 || x != "")
For example,
if (find == "" && find == 0)
print "Oops, forgot to set \"find\"."
More commonly, you'll just be interested in whether a variable has
a null value, for which the test
if (x == "")
will do (use x != "" to test for a non-null value).
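The difference between "unassigned" and "assigned a null or zero value" is easy to check for yourself. This sketch assumes x is never set anywhere; y and z are made-up names given a zero and a null value respectively.

```awk
BEGIN {
    y = 0
    z = ""
    if (x == "" && x == 0) print "x is unassigned"    # x was never set: printed
    if (y == "" && y == 0) print "y is unassigned"    # y is the number 0: not printed
    if (z == "" && z == 0) print "z is unassigned"    # z is the null string: not printed
    if (z == "")           print "z is null"          # printed
}
```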
Comparing string with number: in a comparison such as
if (x == y) or if (x >= y) etc
x and y are compared as strings unless both x and y are numbers.
To force the comparison to be done with the numeric values of the
variables, add 0 to each, as in
if (x+0 == y+0) or if (x+0 >= y+0).
Conversely, to force the comparison to be done with the string values,
concatenate at least one with the null string, as in
if (x "" == y) or if (x >= y "").
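The effect is easiest to see with two values that sort differently as strings and as numbers. In this sketch, x and y are assigned from string constants, so the plain comparison is done on the strings.

```awk
BEGIN {
    x = "10"; y = "9"           # both assigned from string constants
    asStrings = (x < y)         # "10" sorts before "9", so this is 1 (true)
    asNumbers = (x+0 < y+0)     # 10 is not less than 9, so this is 0 (false)
    print "string comparison:", asStrings       # prints: string comparison: 1
    print "numeric comparison:", asNumbers      # prints: numeric comparison: 0
}
```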
Global instead of local: if you forget to declare a local variable, or misspell
the name of a local variable, your variable will be global rather than
local. It will not be initialized to zero each time you call the function, and
if it is used elsewhere as a global then changing its value in one place will
affect the other place. $WordFrequency helps with misspellings (see above).
Function returns garbage: if you forget to return something from a function
then the returned value of the function will be garbage. If your function
should return something, check that it does so in all possible cases.
Patterns are not mutually exclusive: more of a design problem than a bug,
the problem here is to execute one action if some complicated test is true,
and do some other action otherwise. To do it, put all the complicated testing
inside one action, so you can use an "if-else" construction.
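As a rough illustration, the following sketch sorts each input line into exactly one of three classes by doing all the testing inside a single action with no pattern. The classes and the counter names are made up for the example.

```awk
{
    # One action, no pattern: every line goes down exactly one branch.
    if (NF == 0)
        blank++
    else if ($1 ~ /^#/)
        comments++
    else
        code++
}
END {
    # Adding 0 makes a counter that was never touched print as 0.
    print "blank:", blank+0, "comment:", comments+0, "other:", code+0
}
```

Written as three separate patterns, the last class would need a pattern that spells out "not blank and not a comment", which is where mistakes tend to creep in.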
All patterns in your program are executed for each new input line, unless
you do one of:
"next", which retrieves the next input line and starts the pattern
matching over with your first pattern;
"getline", which retrieves the next input line to $0 (if no variable
is specified) but doesn't jump you back to the first pattern;
"exit" which skips to your END statements or exits immediately.
Exit doesn't exit: an "exit" inside an END block does truly and immediately
exit. An exit anywhere else passes control to your END statements if there
are any, so if you need to do an immediate exit in a program that contains
an END block use something like
...
if (should immediately exit)
{
    exit_now = 1
    exit
}
...
END {
    if (exit_now == 1)
        exit
    ...normal END actions
}
Regular expression matches too much: regular expressions try to match
as much as possible - for example, /A.*B/ will match everything between
the first "A" on the line and the last "B" on the line, even if there are 5 B's
in between. To match from A to the first B, use /A[^B]*B/.
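You can watch the difference with match(), which sets RSTART and RLENGTH to the position and length of whatever was matched. The sample string here is made up.

```awk
BEGIN {
    s = "xAyyByyBz"                         # made-up sample text
    match(s, /A.*B/)
    print substr(s, RSTART, RLENGTH)        # prints: AyyByyB
    match(s, /A[^B]*B/)
    print substr(s, RSTART, RLENGTH)        # prints: AyyB
}
```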
Matching C identifiers: "\w" in a regular expression is the same as
"[A-Za-z0-9]" - note it doesn't include the underscore. To match a C
name, use "[A-Za-z_][A-Za-z0-9_]*" or the equivalent
"[A-Za-z_](\w|_)*". Using "[A-Za-z0-9_]+" will work provided
you don't mind catching integers such as "0" or "317" as well as names.
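The following sketch shows the two bracket expressions side by side on a made-up string. It sticks to the spelled-out character classes and does not use "\w".

```awk
BEGIN {
    s = "317 + max_lines2"                  # made-up sample text
    match(s, /[A-Za-z_][A-Za-z0-9_]*/)
    print substr(s, RSTART, RLENGTH)        # prints: max_lines2
    match(s, /[A-Za-z0-9_]+/)
    print substr(s, RSTART, RLENGTH)        # prints: 317
}
```

The first pattern skips the integer because a name may not start with a digit; the second pattern catches the integer first.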
Missing array index: consider the fragment
x[1]= 17
x = "huh?"
x[2] = "hi"
print x[1], x[2], x
-will this run? You bet - a variable and an array can have the same name,
with no interference between the two. The above fragment will print
17 hi huh?
Needless to say, this "feature" should be avoided.